
200 research outputs found

NewMadeleine: scheduling and optimization of high-performance communication schemes

Author: Brunet Elisabeth
Publication venue: HAL CCSD
Publication date: 01/10/2006

National audience
Despite the spectacular progress made by communication interfaces for high-speed networks over the past fifteen years, many potential optimizations still elude communication libraries. This is mainly due to designs focused on shortening the critical path as much as possible in order to minimize latency. In this article, we present a new communication library architecture built around a powerful transfer-optimization engine whose activity is synchronized with that of the network cards. The code of the optimization strategies is generic and portable, and it is parameterized at run time by the capabilities of the underlying network drivers. The database of predefined optimization strategies is easily extensible.
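The abstract describes optimization strategies that are generic, parameterized at run time by driver capabilities, and stored in an extensible database. The Python sketch below illustrates that idea under stated assumptions: `DriverCaps`, `STRATEGIES`, `register` and the two strategies are hypothetical names chosen for illustration, not NewMadeleine's actual (C) API, and the only capability modeled is a maximum packet size.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass(frozen=True)
class DriverCaps:
    """Hypothetical capability record reported by a network driver."""
    max_packet_size: int  # largest packet the driver accepts, in bytes

# A strategy turns pending message sizes into packets (lists of message sizes).
Strategy = Callable[[List[int], DriverCaps], List[List[int]]]

# The extensible "database" of strategies, keyed by name.
STRATEGIES: Dict[str, Strategy] = {}

def register(name: str) -> Callable[[Strategy], Strategy]:
    """Decorator that adds a strategy to the database under `name`."""
    def wrap(fn: Strategy) -> Strategy:
        STRATEGIES[name] = fn
        return fn
    return wrap

@register("default")
def one_per_packet(pending: List[int], caps: DriverCaps) -> List[List[int]]:
    # Baseline: every message travels in its own packet.
    return [[size] for size in pending]

@register("aggregate")
def aggregate(pending: List[int], caps: DriverCaps) -> List[List[int]]:
    # Pack consecutive messages into one packet while the driver's limit allows.
    # A single message larger than the limit still gets its own packet
    # (splitting it is out of scope for this sketch).
    packets: List[List[int]] = []
    current: List[int] = []
    used = 0
    for size in pending:
        if current and used + size > caps.max_packet_size:
            packets.append(current)
            current, used = [], 0
        current.append(size)
        used += size
    if current:
        packets.append(current)
    return packets
```

With this model, `STRATEGIES["aggregate"]([100, 200, 900, 50], DriverCaps(max_packet_size=1024))` yields `[[100, 200], [900, 50]]`. Adding a strategy only requires decorating another function, which is the kind of extensibility the abstract emphasizes.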

INRIA a CCSD electronic archive server

Support for automated scheduling and optimization of communications on high-performance networks

Author: Brunet Elisabeth
Publication venue: HAL CCSD
Publication date: 01/07/2005

Madeleine 4 is a new implementation of the Madeleine multi-protocol communication interface. Its distinguishing feature is a generic data-packet optimization layer, made possible by decoupling the application's execution flow from the communication flow. In the manner of a process scheduler, Madeleine 4's internal behavior is driven by network card activity: when a card is idle, Madeleine 4 applies optimization strategies to the packets awaiting transfer, taking into account application constraints and the characteristics of the underlying high-speed network, in order to choose the best combination of packets to transmit. A first prototype of Madeleine 4 has been implemented and evaluated on the low-level MX/Myrinet communication interface.
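To illustrate the process-scheduler analogy (the application only enqueues, and optimization runs when the card becomes idle), here is a toy, single-threaded Python model. The class and method names are hypothetical, the NIC is simulated by a `nic_busy` flag, and the "best combination" rule is an assumed in-order fill up to a size limit; the actual Madeleine 4 strategies are not reproduced here.

```python
from collections import deque
from typing import Deque, List

class IdleDrivenScheduler:
    """Toy model: the application enqueues, the (simulated) NIC pulls when idle."""

    def __init__(self, max_packet_size: int) -> None:
        self.max_packet_size = max_packet_size
        self.pending: Deque[int] = deque()  # message sizes, in submission order
        self.nic_busy = False
        self.sent: List[List[int]] = []     # packets handed to the NIC, in order

    def submit(self, size: int) -> None:
        # Application side: only enqueues; the NIC is used only if it is idle.
        self.pending.append(size)
        if not self.nic_busy:
            self._on_nic_idle()

    def complete_transfer(self) -> None:
        # Called when the NIC finishes its current packet and becomes idle.
        self.nic_busy = False
        self._on_nic_idle()

    def _on_nic_idle(self) -> None:
        # Optimization point: look at everything that accumulated meanwhile.
        if not self.pending:
            return
        packet: List[int] = []
        used = 0
        # Take messages in submission order (preserving ordering) while they fit.
        while self.pending and (not packet or used + self.pending[0] <= self.max_packet_size):
            size = self.pending.popleft()
            packet.append(size)
            used += size
        self.sent.append(packet)
        self.nic_busy = True
```

Messages submitted while the NIC is busy accumulate, so the idle-time pass can combine them: submitting 100, 200 and 300 bytes sends `[100]` immediately and `[200, 300]` as a single packet once `complete_transfer()` is called.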

INRIA a CCSD electronic archive server

An analysis of the impact of multi-threading on communication performance

Author: Brunet Elisabeth
Denis Alexandre
Trahay François
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/05/2009

International audience
Although processors are becoming massively multicore, and new programming models therefore mix message passing and multi-threading, the effects of threads on communication libraries remain neglected. Designing an efficient modern communication library requires care to limit the performance impact of thread-safety mechanisms. In this paper, we present various approaches to building a thread-safe communication library and study their benefits and their impact on performance. We also describe and evaluate techniques that exploit idle cores to balance the communication library's load across multicore machines.
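The paper compares several ways of making a communication library thread-safe. A common contrast, assumed here for illustration and not necessarily matching the paper's exact designs, is a single library-wide lock versus finer-grained locks (here, one per NIC queue). The Python classes below are hypothetical and only show where contention arises; because of Python's global interpreter lock, they say nothing about actual performance.

```python
import threading
from collections import deque
from typing import Deque, List

class CoarseLockLibrary:
    """Thread safety through one library-wide lock."""

    def __init__(self, n_nics: int) -> None:
        self._lock = threading.Lock()
        self.queues: List[Deque[bytes]] = [deque() for _ in range(n_nics)]

    def post_send(self, nic: int, msg: bytes) -> None:
        with self._lock:  # every thread serializes here, whatever NIC it targets
            self.queues[nic].append(msg)

class FineLockLibrary:
    """Thread safety through one lock per NIC queue."""

    def __init__(self, n_nics: int) -> None:
        self._locks = [threading.Lock() for _ in range(n_nics)]
        self.queues: List[Deque[bytes]] = [deque() for _ in range(n_nics)]

    def post_send(self, nic: int, msg: bytes) -> None:
        with self._locks[nic]:  # only threads targeting the same NIC serialize
            self.queues[nic].append(msg)

def hammer(lib, n_threads: int = 4, per_thread: int = 1000) -> None:
    """Have several threads post messages concurrently, spread over the NICs."""
    def worker(tid: int) -> None:
        for _ in range(per_thread):
            lib.post_send(tid % len(lib.queues), b"x")

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Both classes expose the same `post_send` call, so the locking policy can be swapped without touching callers; the trade-off is that the fine-grained version needs more locks and more care once an operation spans several NICs.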

INRIA a CCSD electronic archive server

A multicore-enabled multirail communication engine

Author: Brunet Elisabeth
Denis Alexandre
Trahay François
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 01/09/2008

International audience
The current trend in cluster architecture is toward massive use of multicore chips. This hardware evolution raises bottleneck issues at the network interface level. Using multiple parallel networks can overcome this problem, as it provides a higher aggregate bandwidth. But this bandwidth remains theoretical, since only a few communication libraries are able to exploit multiple networks. In this paper, we present an optimization strategy for the NewMadeleine communication library that efficiently exploits parallel interconnect links. By sampling each network's capabilities, it is possible to estimate a transfer duration a priori. Messages can thus be split efficiently and their chunks sent over parallel links to reach the theoretical aggregate bandwidth. NewMadeleine is multithreaded and exploits multicore chips to send small packets, which involve CPU-consuming copies, in parallel.
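The abstract states that sampled network capabilities allow a transfer duration to be estimated a priori and messages to be split across rails. A minimal sketch, assuming each rail i follows a linear cost model T_i(s) = L_i + s / B_i (latency L_i and bandwidth B_i, both obtained by sampling), chooses chunk sizes so that all rails finish at the same time T. From sum_i B_i (T - L_i) = S it follows that T = (S + sum_i B_i L_i) / sum_i B_i, and rail i receives s_i = B_i (T - L_i). Rails whose latency exceeds T would receive a negative chunk and are dropped. The function name and the linear model are assumptions, not NewMadeleine's implementation.

```python
from typing import List, Tuple

def split_message(size: float, rails: List[Tuple[float, float]]) -> List[float]:
    """Split `size` bytes over rails so that every used rail finishes at the same time.

    Each rail is (latency_s, bandwidth_bytes_per_s) and is assumed to follow
    T_i(s) = latency + s / bandwidth. Returns one chunk size per rail
    (0.0 for rails dropped because their latency alone exceeds the finish time).
    """
    if not rails:
        raise ValueError("need at least one rail")
    active = list(range(len(rails)))
    while len(active) > 1:
        sum_bw = sum(rails[i][1] for i in active)
        # Common finish time T solving sum_i B_i * (T - L_i) = size.
        t = (size + sum(rails[i][1] * rails[i][0] for i in active)) / sum_bw
        chunks = [0.0] * len(rails)
        for i in active:
            chunks[i] = rails[i][1] * (t - rails[i][0])
        # A negative chunk means the rail cannot finish in time: drop it and retry.
        worst = min(active, key=lambda i: chunks[i])
        if chunks[worst] >= 0:
            return chunks
        active.remove(worst)
    # Only one rail left: it carries the whole message.
    chunks = [0.0] * len(rails)
    chunks[active[0]] = float(size)
    return chunks
```

For example, splitting 3000 bytes over two zero-latency rails of 1000 and 2000 bytes/s gives chunks of 1000 and 2000 bytes, both finishing after one second.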

INRIA a CCSD electronic archive server

HAL Descartes

Short Paper: Dynamic Optimization of Communications over High Speed Networks

Author: Aumage Olivier
Brunet Elisabeth
Namyst Raymond
Publication venue: HAL CCSD
Publication date: 01/01/2006

International audience
We present a new communication subsystem for high-speed networks featuring an extensible packet optimization engine that mixes several communication flows. Optimizations are parameterized by the capabilities of the underlying network drivers and are triggered by the network cards when they become idle. The database of predefined strategies can be easily extended.

CiteSeerX

INRIA a CCSD electronic archive server

High Performance Code Generation for Stencil Computation on Heterogeneous Multi-device Architectures

Author: Brunet Elisabeth
Li Pei
Namyst Raymond
Publication venue: IEEE Computer Society
Publication date: 13/11/2013

International audience
Heterogeneous architectures have been widely used in high-performance computing. On the one hand, they allow a designer to combine multiple types of computing units, each executing the tasks it is best suited for, to increase performance; on the other hand, they bring many programming challenges for novice users, especially on heterogeneous systems with multiple devices. In this paper, we propose STEPOCL, a code generator that produces OpenCL host programs for heterogeneous multi-device architectures. To simplify the analysis process, we ask the user to describe the input and kernel parameters in an XML file; our generator then analyzes this description and automatically generates the host program. Thanks to its data partitioning and data exchange strategies, the generated host program can be executed on multiple devices without changing any kernel code. Experiments with iterative stencil loop (ISL) code show that our tool is efficient: it guarantees minimal data exchange and achieves high performance on heterogeneous multi-device architectures.
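STEPOCL's data partitioning and exchange strategies are not detailed in the abstract. As an assumed illustration of the general technique for iterative stencil loops, the sketch below splits grid rows among devices in proportion to hypothetical per-device weights and adds halo (ghost) rows equal to the stencil radius; only those halo rows need to be exchanged between neighboring devices after each iteration. It is written in Python for brevity, whereas STEPOCL itself generates OpenCL host code, and `partition_rows` is a hypothetical name.

```python
from typing import List, Tuple

def partition_rows(n_rows: int, weights: List[float], radius: int) -> List[Tuple[int, int, int, int]]:
    """Split rows [0, n_rows) of a 2D grid among devices in proportion to `weights`.

    Returns, per device, (own_start, own_end, halo_start, halo_end) as half-open
    ranges: the rows the device updates, and the wider range it must hold so a
    stencil of the given radius can read its neighbors. Rows held but not owned
    are the halo rows refreshed from neighboring devices after each iteration.
    """
    total = float(sum(weights))
    bounds = [0]
    acc = 0.0
    for w in weights:
        acc += w
        bounds.append(round(n_rows * acc / total))
    bounds[-1] = n_rows  # guard against floating-point drift on the last boundary
    parts = []
    for d in range(len(weights)):
        start, end = bounds[d], bounds[d + 1]
        parts.append((start, end, max(0, start - radius), min(n_rows, end + radius)))
    return parts
```

With 100 rows, weights `[1, 3]` and a radius-1 stencil, the first device owns rows 0 to 24 and holds rows 0 to 25, while the second owns rows 25 to 99 and holds rows 24 to 99; each iteration therefore exchanges one row in each direction.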

Crossref

INRIA a CCSD electronic archive server

NewMadeleine: scheduling and optimization of high-performance communication schemes

Author: Aumage Olivier
Brunet Elisabeth
Namyst Raymond
Publication venue: 'Lavoisier'
Publication date: 01/01/2008

INRIA a CCSD electronic archive server

A sampling-based approach for communication libraries auto-tuning

Author: Brunet Elisabeth
Denis Alexandre
Namyst Raymond
Trahay François
Publication venue: HAL CCSD
Publication date: 26/09/2011

International audience
Communication performance is a critical issue in HPC applications, and many solutions (algorithms, protocols, etc.) have been proposed in the literature. Meanwhile, computing nodes are becoming massively multicore, leading to a real imbalance between the number of communication sources and the number of physical communication resources. It is therefore now mandatory to share network boards between computation flows, and to take this sharing into account when performing communication optimizations. In previous papers, we proposed a model and a framework for on-the-fly optimization of multiplexed concurrent communication flows, and implemented this model in the NewMadeleine communication library. This library features optimization strategies that can, for example, aggregate several messages to reduce the number of packets emitted on the network, or split messages to use several NICs at the same time. In this paper, we study the tuning of these dynamic optimization strategies. We show that some parameters and thresholds (rendezvous threshold, aggregation packet size) depend on the actual hardware, both host and NICs. We propose and implement a method based on sampling the actual hardware to auto-tune our strategies. Moreover, we show that multi-rail can greatly benefit from performance predictions, and we propose a multi-rail approach that dynamically balances data between NICs using predictions based on sampling.
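The abstract reports that the rendezvous threshold depends on the actual hardware and can be auto-tuned by sampling. One simple way to turn sampled timings into a threshold, assumed here for illustration since the abstract does not describe the exact procedure, is to pick the smallest sampled size from which the rendezvous protocol is never slower than the eager one. Collecting the timings is left out; `rdv_threshold` and the sample values in the usage note are hypothetical.

```python
from typing import Dict, Optional

def rdv_threshold(eager: Dict[int, float], rdv: Dict[int, float]) -> Optional[int]:
    """Pick a rendezvous threshold from sampled timings (seconds, keyed by size in bytes).

    Returns the smallest sampled size S such that rendezvous is no slower than
    eager for S and for every larger sampled size, or None if eager always wins.
    """
    sizes = sorted(set(eager) & set(rdv))  # only sizes sampled with both protocols
    threshold = None
    # Walk from the largest size down; stop at the first size where eager wins.
    for size in reversed(sizes):
        if rdv[size] <= eager[size]:
            threshold = size
        else:
            break
    return threshold
```

With eager samples `{1024: 2e-6, 8192: 6e-6, 65536: 40e-6}` and rendezvous samples `{1024: 5e-6, 8192: 7e-6, 65536: 25e-6}`, the function returns 65536. Walking down from the largest size ensures the threshold marks a point beyond which rendezvous wins consistently, rather than an isolated sample where it happens to be faster.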

INRIA a CCSD electronic archive server

HAL Descartes

NewMadeleine: An Efficient Support for High-Performance Networks in MPICH2

Author: Brunet Elisabeth
Buntinas Darius
Mercier Guillaume
Trahay François
Publication venue: HAL CCSD
Publication date: 25/05/2009

International audience
This paper describes how the NewMadeleine communication library has been integrated into the MPICH2 MPI implementation, and the benefits this brings. NewMadeleine is integrated as a Nemesis network module, but the upper layers, in particular the CH3 layer, have been modified. By doing so, we allow NewMadeleine to deliver its full performance to MPI applications. NewMadeleine features sophisticated strategies for sending messages and natively supports multirail network configurations, even heterogeneous ones. It also uses a software component called PIOMan that relies on multithreading to enhance reactivity and build more efficient progress engines. We present various results showing that NewMadeleine is indeed well suited as a low-level communication library for building MPI implementations.

INRIA a CCSD electronic archive server

NewMadeleine: a Fast Communication Scheduling Engine for High Performance Networks

Author: Aumage Olivier
Brunet Elisabeth
Furmento Nathalie
Namyst Raymond
Publication venue: HAL CCSD
Publication date: 01/01/2007

International audience
Communication libraries have made dramatic progress over the past fifteen years, pushed by the success of cluster architectures as the preferred platform for high-performance distributed computing. However, many potential optimizations are left unexplored in the process of mapping application communication requests onto low-level network commands. The fundamental cause is that the design of communication subsystems is mostly focused on reducing latency by shortening the critical path. In this paper, we present a new communication scheduling engine that dynamically optimizes application requests according to the NICs' capabilities and activity. The optimizing code is generic and portable, and the database of optimizing strategies may be dynamically extended.

INRIA a CCSD electronic archive server